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A STOCHASTIC APPROXIMATION ALGORITHM FOR 
STOCHASTIC SEMIDEFINITE PROGRAMMING 

BRUNO GAUJAL AND PANAYOTIS MERTIKOPOULOS 


Abstract. Motivated by applications to multi-antenna wireless networks, we 
propose a distributed and asynchronous algorithm for stochastic semidefinite 
programming. This algorithm is a stochastic approximation of a continous- 
time matrix exponential scheme regularized by the addition of an entropy-like 
term to the problem’s objective function. We show that the resulting algo¬ 
rithm converges almost surely to an ^-approximation of the optimal solution 
requiring only an unbiased estimate of the gradient of the problem’s stochastic 
objective. When applied to throughput maximization in wireless multiple- 
input and multiple-output (MIMO) systems, the proposed algorithm retains 
its convergence properties under a wide array of mobility impediments such 
as user update asynchronicities, random delays and/or ergodically changing 
channels. Our theoretical analysis is complemented by extensive numerical 
simulations which illustrate the robustness and scalability of the proposed 
method in realistic network conditions. 


1. Introduction 

Semidefinite programming (i.e. the minimization of a convex function over a 
convex subset of the cone of positive-semidefinite matrices) comprises a rich class 
of convex optimization problems that is both relatively tractable (interior-point 
methods can often be used with polynomial worst-case complexity [27]) and also 
very powerful (many optimization problems in engineering and combinatorial opti¬ 
mization can be recast as semidefinite programs [41]). Especially in an engineering 
context however, many applications involve a certain degree of randomness (either 
in the objective function itself or in the feedback provided to the optimizer) [15, 43] 
so many standard semidefinite optimization algorithms cannot be applied “off the 
shelf”. For instance, minimum volume convering problems (where quadratic func¬ 
tions can be expressed as semidefinite constraints) have been a very active research 
topic for the last fifty years [39]. In wireless telecommunications, transmission 
ranges of mobile devices have also been modeled as Euclidean balls with random 
parameters, hence expressible via semidefinite constraints with stochastic perturba¬ 
tions; as a result, route discovery in mobile ad-hoc networks is typically addressed 
using stochastic semidefinite programming approaches [45]. In view of the above, 
we focus in this paper on stochastic semidefinite programming, a subclass of semi¬ 
definite programs where the objective function is given in the form of a stochastic 
expectation, with possibly unknown randomness. 
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In this framework, there are two main algorithmic approaches. In the “offline” 
approach, it is assumed that the optimizing agent (or agents in the case of multi¬ 
agent optimization) knows the stochastic expectation of his objective function in 
some (semi-)explicit form (possibly quite complicated) and tries to optimize it by 
calling an appropriate semidefinite optimization algorithm. On the other hand, in 
the “online” approach to optimization, the functional form of the objective function 
(and any inherent randomness) is unknown and the agent seeks to optimize his 
objective based on indirect (and possibly imperfect) performance indicators. The 
former approach is usually employed in large-scale industrial optimization problems 
where the collection of data is not costly, but their processing is; instead, the latter 
approach applies to distributed optimization problems in complex systems (such 
as networks) where the optimizing agents are not capable of collecting a lot of 
optimization data - but the agents have the computing power to handle the data 
they collect. 

Motivated by applications to wireless networks, our paper adopts the second 
approach with the aim of proposing a fully distributed algorithm for stochastic 
multi-agent semidefinite optimization problems that requires minimal (and possi¬ 
bly imperfect) gradient information; and a) it is fully parallelizable and does not 
require any coordination between the optimizing agents. This algorithm is obtained 
as a variable step-size stochastic approximation [7, 8] of a continuous-time matrix 
exponential learning scheme which has important ties to the mirror descent ma¬ 
chinery of [17, 25, 26]. In contrast to mirror descent methods however, we establish 
the convergence of the algorithm’s last iterate and not only the convergence of its 
empirical time-average, properly weighed by the step-size sequence employed. In 
applications to wireless mobile systems, this is crucial because it implies the con¬ 
vergence of the network to a stable, optimum state in a strong sense instead of a 
weaker, average sense. 

To complement our abstract theoretical analysis (Sections 2 and 3), we also 
present a concrete application to multi-antenna wireless mobile networks with er- 
godically changing channel conditions. As explained in Section 4, this case fits 
squarely within the core stochastic semidefinite programming framework of Section 
2: First, this is due to the problem’s inherently distributed aspect (since it is often 
impossible - or imractical - to coordinate and/or synchronize the mobile users’ 
updates), and, second, due to the lack of full system information at the user end 
and the fact that users do not necessarily know the stochastic law of their channels. 

The users’ objective in this setting is to maximize their information transmis¬ 
sion rate by optimizing the covariance matrix of their input signal distribution. 
Two cases are considered. First, we consider the case where the users have perfect 
feedback from the receiver but their channels evolve following a stationary, ergodic 
process (the fast-fading regime) [15]; in this case, the users’ transmission rate is 
the stochastic average of their achievable rate over all channel realizations and the 
problem boils down to a multi-agent stochastic semidefinite program. The second 
case concerns static channel conditions (i.e. the wireless medium is assumed to 
evolve at a much slower rate than the transmitters’ update time-scale). In this 
case, a major challenge arises if the users only have access to imperfect receiver 
feedback and channel state information; thus, even though the underlying prob¬ 
lem is deterministic, stochasticity arises from the noise in the users’ measurements 
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and observations. A partial description of our method applied to this “imperfect 
information” case was presented at our earlier conference paper [12]. 

In both cases, we show how the proposed algorithmic scheme can be implemented 
in both synchronous and asynchronous ways. Additionally, we also provide a pro¬ 
cedure to compute an unbiased estimator of the gradient of the transmission rate 
for each transmitter via receiver-transmitter reciprocity. Finally, we also provide a 
suite of numerical simulations to illustrate the robustness of our algorithm and to 
compare it to more traditional water-filling techniques (which it outperforms). 

2. Problem Formulation and Preliminaries 

As we mentioned in the introduction, our main goal is to provide an efficient 
and robust solution method for semidefinite optimization problems where the ob¬ 
jective function depends on a controlled matrix variable X and a random variable 
w (that cannot be controlled by the optimizer). More precisely, we consider prob¬ 
lems where, through repeated iterations, the optimizing agent (or agent for short) 
seeks to converge to a value of X that optimizes the expected value of the objec¬ 
tive function with respect to ui (i.e. that solves the agent’s stochastic optimization 
problem “on average”). Obviously, if the agent’s “mean” objective function can be 
calculated explicitly, the above boils down to a deterministic problem; however, a 
major challenge occurs if this expectation cannot be calculated - or, worse, if the 
distribution of u is not even known to begin with. 

The above problem will comprise the core of our considerations and we will 
formalize it in the following section; a variant formulation for multi-agent environ¬ 
ments is then provided in Section 2.2. From a mathematical point of view, both 
models are essentially equivalent but, from a practical standpoint, they describe 
problems of a very different nature. 

2.1. The core problem. Let Hm = {X € C MxM : X = X^} denote the space 
of M x M Hermitian matrices and let X = {X 0 : tr(X) = 1} denote the 
spectrahedron of positive-semidefinite matrices with unit trace. In what follows, 
we will focus on the stochastic semidefinite optimization problem: 

minimize E [/ (X; ui 
subject to X S X, 

where u> is an abstract random variable taking values in some probability space f2, 
the expectation E]-] is taken with respect to the law of w, and f: X x Q —>■ M is a 
smooth random function which is convex with respect to X £ X for all to £ f l. 

Importantly, the simple formulation (SSP) above accounts for a fairly wide class 
of stochastic optimization problems over compact spectrahedra (the semidefinite 
equivalent of polytopes, either real or complex). In fact, as long as the feasible 
region X' of a semidefinite program is a spectrahedron that is invariant under 
unitary transformations of the form X i-a UXUl for all unitary matrices U, 1 
optimizing a convex function over X' boils down to optimizing a convex function 
over X (at the cost of increasing the problem’s dimensionality) [29]. As such, (SSP) 
can be seen as a canonical form for stochastic optimization problems over compact, 
unitary-invariant spectrahedra. 

1 Recall here that a complex matrix U is unitary if and only if UU' = TJ' U = I. For the real 
case, invariance need only hold over all orthogonal matrices O such that OO t = O t O = I. 


(SSP) 
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In the above framework, the mean objective function 

F(X)=E[/(X;w)] (2.1) 

is itself convex over X ; for simplicity, we will also assume that F is finite and smooth 
over X. In this way, we obtain the (convex) semidefinite optimization problem 

minimize F (X), 

(SP) 

subject to XeAT, 

which could be solved by standard convex programming methods, provided that F 
is known to the optimizer. As such, the main difficulty in solving (SSP)/(SP) is 
precisely that the law of ui may not be known, in which case the functional form 
of F is also unknown. To circumvent this difficulty, standard results in convex 
analysis [38] show that the gradient matrix 

V(X)=V x A(X) (2.2) 

of the mean objective function F may be calculated by interchanging differentiation 
with expectation, i.e. 

V x F(X) = E[V X /(X; w)[ for all XgX. (2.3) 

Thus, following the stochastic approximation approach of [25, 26, 41], we will fo¬ 
cus on solving (SSP) based only on random (sample-dependent) estimates of the 
stochastic gradient matrices Vx /(X; w). 2 

To make this precise, let X(l), X(2),..., be a (possibly random) sequence of play 
by the optimizing agent - that is, at stage n, the agent chooses X(n) and incurs 
an expected cost of F(X(n)). Then, at each stage n = 1,2,..., we will assume 
that the agent has access to a random matrix V(n) which satisfies the statistical 
unbiasedness hypothesis: 

Assumption 1 . V (n) is a uniformly bounded random variable such that 

E[V(n)| F n \ = V(X(n)) for all n = 1,2,..., (Al) 

where T n denotes the filtration induced by the history process X(n). 

The statistical hypothesis above allow us to account for a very wide range of 
estimation oracles: in particular, we will not be assuming independent and iden¬ 
tically distributed (i.i.d.) observations (a feature which is crucial in the context 
of wireless networks where observations are typically correlated with the state of 
the system). Instead, we will only assume there is an oracle mechanism that re¬ 
turns a .Fn-measurable estimate V(n) of V(X(n)) once the agent plays X(n); the 
construction of such an oracle for specific applications will be detailed in Section 4. 

Remark 2.1. An important special case of the problem (SSP) is when the expec¬ 
tation in (2.1) is deterministic , i.e. = /(-,u/) for almost every o o,ui' £ Q. 

In that case, Assumption (Al) accounts for problems where the optimizer is called 
to solve a deterministic semidefinite program with imperfect gradient feedback and 
stochasticity stems from the random noise perturbing the agent’s observations. 
More generally, depending on the structure of the probability space f2, the ran¬ 
domness in the stochastic optimization problem (SSP) and the randomness in the 

2 For posterity, we note here that Vx/(X;oj) is Hermitian (on account of the fact that / is 
real). 
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gradient observations V(n) could be completely decoupled; the only assumption 
that we will make regarding these different degrees of randomness is (Al). 

2.2. Multi-agent optimization and games. In multi-agent environments, we 
assume that there are multiple optimizing agents k = 1,..., K, each one controlling 
an individual control variable X k that impacts the agents’ global objective / in a 
different way. Specifically, this amounts to the following multi-agent version of 
(SSP): 

minimize E [/(X .. Xa';w)], (2 4) 

subject to X*; £ X k , 

where X k = {X*. £ C MtxM ‘ : X^ 0, tr(Xj.) = 1} denotes the feasible region 
of agent k and /: Y\ k X k x fl —► K. satisfies the same convexity and smoothness 
assumptions as before. 

In this setting, if there is no central controller to coordinate the agents’ actions 
and provide global feedback, it will be assumed that agents can only access an 
estimate ~V k of their individual gradient matrices 

Vfc(X) = V Xfc F(X) (2.5) 

where X = (Xi denotes the agents’ aggregate action profile and F(X) = 

E[/(X;w)]. Thus, mutatis mutandis, we will assume that Assumption (Al) applies 
to each agent separately, and we will seek to provide a distributed optimization 
algorithm that solves (2.4) under these assumptions. 

As a further extension of the above framework, we will also consider the case 
where each agent seeks to minimize unilaterally an individual objective function f k 
(i.e. there is no global objective). This situation is known as a game in normal 
form (or, more simply, a game ) and the solution concept that we will focus on 
is that of Nash equilibrium [14, 23, 24, 30]. Formally, we will say that an action 
profile X* = (X*,..., X* K ) is a Nash equilibrium of the game induced by the mean 
individual objective functions F k (X) = E[//-(X; w)] when 

F k (X*) < F k (X k -X*_ k ) (NE) 

for every unilateral deviation X k £ X k and for every agent k = 1 ,.,.,/v, with 
(Xfe; X^ fe ) denoting the tuple (XJ,..., X k ,..., Xj,). Put differently, Nash equilib¬ 
ria are simply action profiles which are unilaterally stable in that no agent has any 
incentive to deviate from them. 

The connection between game theory and distributed optimization is recovered 
in the class of potential games [23], i.e. games where the players’ mean objective 
functions are aligned along a common potential function F. More precisely, fol¬ 
lowing Monderer and Shapley [23] , we will say that A is a potential function for a 
game with mean objectives F k when 

F k (X k \ X_fc) — F k (X' k -, X_fc) = F(X k ; X_ k ) — F k (X' k -, X_ k ) (2.6) 

for all actions X k ,X' k £ X k of agent k, and for all action profiles X_ k £ X_ k = 
of fc’s opponents. As can be easily seen, if a game admits a potential func¬ 
tion, its Nash equilibria necessarily coincide with the critical points of its potential 
function [23]. Thus, if the game’s potential F is convex over X = X k , it follows 
that the equilibrium problem (NE) can be reduced to the distributed optimization 
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Algorithm 1 Algorithmic implementation of (DXL). 

Parameters: discount parameter r > 0; decreasing step-size sequence 7 n . 
Initialize: n <— 0; Y t—0; X t—exp(Y)/tr[exp(Y)]; 

Repeat 

foreach agent k £ K. do 
get gradient estimate V; 

update score matrix: Y •<— Y-7„(v + ty); 

update primal variable: X £- exp(Y)/ tr[exp(Y)]; 

_ n £- n + 1 ; 

until termination criterion is reached. 


problem (2. 4 ). 3 We will use this observation freely throughout our paper - and, 
especially, in Section 4. 


3. Algorithms and Results 


3.1. Single-agent optimization analysis. The main algorithmic scheme that we 
will use to solve (SSP)/(SP) will be based on the following discounted exponential 
learning (DXL) recursion: 


Y(n+1) 
X(n + 1) 


Y(n) - 7n (V(n) + rY(n)) , 

exp( Y [n + 1)) 
tr[exp(Y(n + 1))] ’ 


(DXL) 


where: 

( 1 ) Y (n) is an auxiliary scoring matrix which aggregates gradient information. 

(2) V (n) is a random matrix variable satisfying the unbiasedness assumption 

(Al). 

(3) 7 n , n = 1,2,..., is a nonincreasing step-size sequence (specific assumptions 
for 7 „ will be discussed below). 

(4) r > 0 is a (small) discount parameter which acts as a failsafe against the 
iterates Y(n) getting out of bounds. 

Intuitively, (DXL) acts as a “regularized” stochastic gradient descent process: if 
we ignore the parameter r for the moment, each iteration of (DXL) simply aggre¬ 
gates the received gradient information (in the update of Y) and then “projects” 
back to the primal variable X to receive a new gradient and continue the pro¬ 
cess. The reason that the exponentiation step acts as a “projection” operator is 
that it aligns the eigenvalues of X with those of Y, so, in a certain sense, X is an 
“exponential projection” of Y to X. 

The role of the discount parameter r in (DXL) (and the reason for calling it a 
“discount” in the first place) is more subtle. To understand it, note first that the 


3 ConverseIy, every distributed optimization problem can be seen as a potential game by setting 
fk = f f° r all k. 
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recursive step of (DXL) can be rewritten in aggregate form as: 

n 

Y(n + 1) = e~ T ^Y(l) - £ e~ T ^ lm V(rn), (3.1) 

771=1 

where, assuming that 7 „ is small enough, we have set: 

n 

T n ,m = ^2 k’St 1 - T 7j)- (3-2) 

j=m 

By expanding the logarithm to leading order in (3.2), this last sum is asymptotically 
equal to —Tt n rn where 

n 

(3.3) 

j=m 

Accordingly, to leading order, (3.1) can be rewritten for large n (and small 7 n ) as: 

n 

Y(n + 1) « r* 71 ’ 1 Y(l) — ^ r tnim+lr y m V ( m ), (3.4) 

771=1 

with f ram given by (3.3) and r = exp(—r). 

Of course, the above derivation is approximate in nature but it highlights the dis¬ 
count role of t. For a constant step-size sequence q n = 7 , we have t n ^ m = 7 - (n — m), 
so the exponential sum in (3.4) means that (DXL) assigns (exponentially) more 
weight to recent observations rather than older ones. In a sense, this discount¬ 
ing counters the use of a vanishing j n . A decreasing step-size implies that more 
recent gradient observations enter the algorithm with a decreasing weight; by con¬ 
trast, the use of a positive discount parameter r > 0 tempers this (somewhat 
counter-intuitive) behavior by increasing the relative weight of more recent gradi¬ 
ent observations. Moreover, from a calculational standpoint, the use of a positive 
discount parameter t has the added benefit that the auxiliary score matrices Y (n) 
cannot grow too large. If the step-size sequence j n is chosen in a way such that the 
geometric series r m r tn ' rn+1 remains summable , 4 then Y(n) will be uniformly 

bounded on account of (3.4) and Assumption (Al). Since computing d 

Of course, in so doing, the discount parameter r also introduces a systematic 
deterministic bias to the gradient observations V(n), i.e. a perturbation that per¬ 
sists even in the noiseless regime where V(n) is actually deterministic. Indeed, if 
(DXL) is run with perfect gradient observations V(n) = V(X(n)), then any fixed 
point X* of (DXL) will satisfy: 

rY* = V(X*), 

x * = exp(Y*) (3.5) 

tr[exp(Y*)]' 

Setting V* = V(X*) for convenience and solving (3.5) for X* then gives: 

V* + rlogX* = —rlogtr [exp(—V*/r)]l, (3.6) 

or, after a slight rearrangement: 

V* + t (log X* + I) = —kI, 

1 We will elaborate more on the choice of 7.,, below. 


(3.7) 
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for k = t (1 + tr[exp(—V*/r)]). Importantly, the RHS of (3.7) can be written more 
simply as V* + r (logX* + I) = V F r (X*) where the perturbed objective function 
F t : X —> ffi. is defined as: 

F r (X) = F(X) + MX), (3.8) 

with 

h(X) = tr[XlogX] (3.9) 

denoting the so-called von Neumann (or quantum) entropy of X [42]. 5 Thus, given 
that X* must satisfy the trace constraint tr(X*) = 1, it follows that any fixed point 
X* of (DXL) will be a solution of the perturbed optimization problem: 

minimize F(X) + r tr [X log X], 
subject to X £ X. ^ 

Obviously, the solution set of (SP T ) is asymptotically close to that of the un¬ 
perturbed problem (SP) in the limit r —> 0 (where the entropic perturbation term 
hfX.) vanishes): more precisely, if X* is a solution of (SP T ), there exists a solution 
X* of (SP) such that the distance between X* and X* vanishes as r —> 0. That 
said, an important difference between (SP) and (SP T ) is that the latter is strictly 
convex (because h is). As a result, (SP r ) admits a unique solution, even when the 
solution set of (SP) is a non-singleton convex set. 

With all this in mind, we are in a position to state our main result for (DXL): 

Theorem 3.1. Assume that (DXL) is run with gradient observations satisfying 
(Al) and with a variable step-size sequence such that 7 n < 7 n = 

oo. Then, the iterates X(n) of (DXL) converge almost surely to a solution of the 
perturbed optimization problem (SP T ); in particular, X(n) converges ( a.s .) within 
e(t) of a solution of the stochastic problem (SSP) and the error e(r) vanishes in 
the limit r —> 0. 

Theorem 3.1 will be our main result for (DXL) so, before proving it, some remarks 
are in order: 

Remark 3.1. The statement of Theorem 3.1 suggests that the discount parameter 
r should be taken as small possible in order to ensure the algorithm’s convergence 
to a state X* £ X that is as close as possible to the solution set of (SSP). On the 
other hand, very small r > 0 could mean that the iterates Y(n) of (DXL) could 
grow quite large, potentially exceeding the numerical capacity of the optimizing’s 
agent calculating device - recall the discussion surrounding (3.4). As a result, the 
discount parameter r > 0 essentially reflects the algorithm’s accuracy vs. memory 
trade-off: lower values of r > 0 lead to better solutions of (SSP), but at the expense 
of higher memory requirements and more processing power. Ultimately, the choice 
of t relies on the technical specifications of the optimization problem to be solved 
so the “optimal” choice of r can only be made on a case-by-case basis. 

Remark 3.2. In a similar vein to the above remark, Assumption 1 can actually 
be relaxed to account for gradient observations that are only bounded in mean 
squre (instead of being bounded almost surely). In this context however, a given 
observation V of V could exceed the storage/processing capacity of the agent’s 

'"’That the gradient of F T is V />(X) = V(X) + r(logX + I) follows from standard arguments 
in matrix calculus - see e.g. [13, Appendix D[. 
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optimizing device, thus introducing additional arithmetic stability errors to running 
(DXL). Such issues lie beyond the scope of the current work so we opted to work 
with the almost sure boundedness assumption for simplicity. 

Remark 3.3. We should also note here that the “£ 2 — I ln summability condition 
EZiln < EZ ^ 7 n = oo can also be relaxed in the context of Assumption 1. 
Specifically, Theorem 3.1 remains true even with significantly more aggressive step- 
size sequences of the form = n~ a for some arbitrarily small a > 0. The reason 
for stating (and proving) Theorem 3.1 in the “£ 2 framework was only done for 
simplicity; in practice, the use of a (nearly) constant step-size greatly accelerates 
the algorithm, a fact that we explore in Section 4. 

Now, to prove Theorem 3.1, our strategy will be as follows: First, we will 
show that the iterates of (DXL) constitute a so-called asymptotic pseudotrajec¬ 
tory (APT) of the mean, continuous-time dynamics: 

Y = -V(X) - rY, 

_ exp(Y) (DXL C ) 

tr[exp(Y)] ’ 

i.e. the iterates of (DXL) are asymptotically close to solution segments of (DXL C ) 
of arbitrary length [7]. We will then show that (DXL C ) converges to the (unique) 
solution of the perturbed optimization problem (SP T ); the claim of (3.1) will then 
follow from standard results in the theory of stochastic approximation [7]. 

We begin by showing that the iterates of (DXL) comprise an asymptotic pseu¬ 
dotrajectory of the dynamics (DXL C ) in the sense of [7], i.e. 

lim sup ||X(f + h) — $h(X(t))|| = 0 (a.s.), (3.10) 

t^o°C)<h<T 

where X(t), t > 0 is the linear interpolation of the iterates X(n) of (DXL) while 
$t(X) denotes the flow induced on X by (DXL C ) - i.e. 4> t (X), t > 0, is the solution 
trajectory of (DXL C ) that starts at X £ X. To that end, we will first need the 
following boundedness result: 

Lemma 3.2. If < 1/r for all sufficiently large n, then the iterates Y (n) of 
(DXL) under Assumption 1 are bounded (a.s.). 

Proof. First, let V >0 be such that ||V(n)|| < V almost surely (that such a V exists 
is a consequence of Assumption 1); additionally, let no be such that 0 < 1 — 7 n r < 1 
for all n > no- Then, for n > no, the definition (DXL) of Y(n) and the bound 
||V(n)|| < V readily yield ||Y(n + 1)|| < (1 — T 7 „)||Y(ra)|| + j n V. We are thus 
reduced to the following cases: 

. If r||Y(n)|| > V, then ||Y(n + l)|| < ||Y(n)||+ 7 n (^-r||Y(n)||) < ||Y(n)||, 
so Y (n) decreases in norm. 

• Otherwise, if r||Y(n)|| < V, we will have ||Y(n + 1)|| < (1 — r ) n T W/ T + 
InV = V/t. 

It follows that ||Y(n + 1)|| will either decrease or be uniformly bounded by V, so 
our claim follows by induction. □ 

Thanks to this lemma, we readily obtain: 

Proposition 3.3. With notation as in Lemma 3.2, the sequence Y (n) comprises 
an asymptotic pseudotrajectory of (DXL C ). 
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Proof. First, taking expectations in the RHS of (DXL) yields: 

E[Y(n) - 7 n(V(n) + tY( n)) | F n ] = Y(n) - 7n (V(X(n)) + rY(n)), (3.11) 

where we used Assumption 1 and the fact that Y (n) and X(n) are fully deter¬ 
mined by J~ n - Since Y (n) is bounded (a.s.) by Lemma 3.2, our claim follows from 
Proposition 4.1 in [7]. □ 


We now proceed to show that the dynamics (DXL C ) converge to the (unique) 
solution of the perturbed optimization problem (SP r ) from any initial condition 
Y (0). To that end, we will first need to derive the dynamics of the primal control 
variable X(f): 


Lemma 3.4. Let X(t) be a solution orbit of the continuous-time dynamical system 
(DXL C ). Then, X(£) satisfies the dynamics: 

X=-/ X 1-S V T (X)X S ds + tr[XV T (X)]X, (3.12) 

J o 


where 

V r (X)=V(X)+T logX. (3.13) 


Proof. Let Z(Y) = tr[exp(Y)]. Then, differentiating X(f) with respect to t, we 
get: 


X = 


1 d exp(Y) • 

Z(Y)d, eX P< Y) -—^ 


1 


Z(Y) 


Z 2 ( Y) 
e (i-s) Y Y e sY ds- 


Z 2 (Y) 


' tr[Ye^ 


= -/ X 1_S V T (X)X S ds + tr[XV r (X)]X, 


(3.14) 


Jo 

where the second equality is an application of Frechet’s derivative formula for matrix 
exponentials [16] and the last one follows by recalling that X = exp(Y)/tr[exp(Y)] 
so Y = -V(X) - r(logX + Z(Y)I) = -V T (X) - rZ(Y)I by the definition of the 
dynamics (DXL C ) . □ 


With this explicit expression for the evolution of X at hand, we are almost 
in a position to show that the perturbed objective function iq-(X) = F(X) + 
rtr[XlogX] of (SP T ) is a strict Lyapunov function for the dynamics (3.12). The 
only other result that we will need is the following Jensen-like inequality for positive- 
definite matrices: 


Lemma 3.5. Consider Hermitian matrices W, X € Hm with X >- 0 and tr(X) = 
1. Then, for all s € [0,1], we have tr(X 1 - s WX s W) > tr(XW) 2 with equality if 
and only ifW oc I. 

Proof. Let a = (1 - s)/2, b = s/2, and set A = X 1 / 2 , B = X Q WX h . Then, the 
Cauchy-Schwarz inequality for matrices gives tr(AAT) tr(BB^) > |tr(ABT)| 2 with 
equality iff A oc B. On the other hand, we also have tr(AA’l') = trX = 1 and 
tr(BB t ) = tr[X a WX h X b WX a ] = tr[X 1 " s WX s W], leading to the inequality: 


1 • tr[X 1-s WX s W] > 


tr[X 1/2 X s/2 WX (1 " s)/2 ] 


= |tr(XW)| 2 = tr(XW) 2 , 

(3.15) 


2 
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where the last equality follows from the fact that tr(XW) is real (recall that X is 
positive-definite while W is Hermitian). This inequality holds as an equality if and 
only if X 1 / 2 oc X“WX‘ so, with a + b = 1/2, this last condition is equivalent to 
W oc I, as claimed. □ 

With all this in hand, we obtain: 

Proposition 3.6. Let X(f) be an interior solution orbit of the continuous-time 
dynamics (DXL C ). Then, ^FV(X(f)) < 0 for all t > 0, with inequality if and only 
if V T (X(t)) oc I - i.e. at interior stationary points of (3.12). 

Proof. By a simple application of the chain rule, we readily get: 

Fr =tr[XVF r (X)] = — tr[X • (V(X) + r(log X + I))] = — tr[XV T (X)], (3.16) 

where we have used the definition of V T and the fact that tr[X] = 0 (since tr[X(t)] = 
1 for all t > 0). Invoking Lemma 3.4, we then obtain 

F t = f tr [X 1 - s V r (X)X s V r (X)] ds - tr[XV r (X)] 2 
Jo 

i 

tr [X 1 - s V r (X)X s V T (X)] - tr[XV T (X)] 2 ds, (3.17) 

and our assertion follows from Lemma 3.5 above. □ 

As a corollary of the above, we then get: 

Corollary 3.7. For every initial condition Y(0) € PLm, the dynamics (DXL C ) 
converge to the unique solution X/ of the perturbed optimization problem (SP T ) 

Finally, we have: 

Proof of Theorem 3.1. By Proposition 3.3, the iterates of (DXL) form an asymp¬ 
totic pseudotrajectory of the continuous-time dynamical system (DXL C ). Since the 
objective function of the perturbed optimization problem (SP r ) is a strict Lyapunov 
function for the latter (Proposition 3.6 coupled with the fact that any solution of 
(SP r ) is interior), our claim follows readily from standard stochastic approximation 
results [7, Theorem 5.7]. □ 

From the proof of Theorem 3.1, we can identify two points where the positivity 
of r plays a crucial role. The first is the boundedness of the iterates Y(n) of the 
algorithm (Lemma 3.2) which guarantees that (DXL) is a stochastic approximation 
of the mean dynamics (DXL C ). The second is the fact that the problem (SP T ) 
admits a unique, interior solution. In the limit case r = 0, it is still possible to 
show that (DXL) comprises an asymptotic pseudotrajectory of (DXL C ) but the 
Lyapunov argument of Proposition 3.6 is more subtle. Since we are only interested 
in algorithms with finite iterates (for computer arithmetic reasons), we will not 
press this issue further, delegating it instead to future work. 

3.2. Distributed optimization in asynchronous multi-agent environments. 

Of course, even though the information requirements of (DXL) are relatively mini¬ 
mal (an imperfect oracle call to the gradient of the agent’s stochastic objective), it 
is not clear whether it can be readily extended to a distributed optimization setting 
(or a game-theoretic context) where agents update independently of one another 
and there is often a delay between agent updates and observations. To overcome 
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Algorithm 2 Asynchronous implementation of (DXL). 

Parameters: discount rate r > 0; initial step-size 7. 
Initialize: n t— 1 ; Y •<— 0; X <— exp(Y)/tr[exp(Y)]. 

Repeat 

UpdateEvent occurs at time r(n); 

get gradient estimate V; 

update score matrix: Y •<— Y + 7 /nV; 

update primal variable: X <— exp(Y)/ tr[exp(Y)]; 

n n + 1; 

until termination criterion is reached. 


these limitations, we examine here a fully decentralized variant of (DXL) which 
addresses the issues above. 

To make all this precise, we will work with the multi-agent stochastic optimiza¬ 
tion problem (2.4) and we will assume that the agents seek to converge to a solution 
thereof through repeated play. To that end, let n denote the n-th overall update 
epoch in the system, let IC n C 1C denote the subset of agents who update at this 
epoch (typically \K n \ = 1 if agents update at random times), and let dk(n) be the 
number of periods that have elapsed at period n since the last update of agent 
k. With all this in mind, we will focus on the following asynchronous variant of 
(DXL): 


Y k (n+ 1) = Y fc (n) - 7„ fe 1 (k e K, n ) ■ (V k (n) + rY t (n)), 


Xfc(n+ 1) 


ex P(Yfc(n + 1)) 
tr[exp(Y fc (n + 1))] ’ 


(3.18) 


where n k = Y'.j—i 1 (k £ ICj) denotes the number of updates performed by agent 
k up to epoch n while the (asynchronous) gradient estimate ~V k (n) satisfies the 
unbiasedness assumption: 


E [Vfc(n) | Tn\ = V k (X 1 (n-d 1 (n)),...,X K (n-d K (n))), (AT) 

where, as before, V fc (Xi,... ,X n ) = V X|i F(X 1 ,.. .,X K ). 

By definition, Y k(n) and X k (n) are updated at the ( n + l)-th update period if 
and only if A; £ IC n , so every agent follows his individual update timer, indepen¬ 
dently of what other agents in the system do (for a pseudocode implementation, 
see Algorithm 2). Remarkably, in this completely decentralized context (with out- 
of-date and/or imperfect gradient observations), we still get: 


Theorem 3.8. Assume that the agents’ delay process dk{n) are bounded ( a.s .) and 
the set of agents lC n that updates at the n-th overall update epoch is a homogeneous 
recurrent Markov chain - i.e. all agents update a strictly positive rate. Assume 
further that Algorithm 2 is run with step-sizes 7 „ oc 1 jn and imperfect gradient 
estimates Vfc(n) satisfying the unbiasedness assumption (AT). Then, the algo¬ 
rithm’s iterates converge (a.s.) to the (unique) minimizer of the perturbed objective 
F r (Xu • • •, X K ) = F(X U .. .,X K ) + T £f =1 tr[X, logX fc ] over X = nf =1 X k . 
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In particular, Algorithm 2 converges within e{r) of a solution of the distributed 
stochastic optimization problem (2.4) and the approximation error e(r) vanishes as 
T —> 0 + . 

Proof. Following Theorems 2 and 3 in [8], the asynchronous recursion (3.18) may 
be seen as a stochastic approximation of the rate-adjusted dynamics: 

Y k = -p k [V k +rY k ], (3.19) 

where pk = limn^oo nk/n > 0 is the asymptotic update rate of user k (the existence 
and positivity of this limit follows from the ergodicity assumption on the set-valued 
process KL n ). This multiplicative factor does not alter the rest points of the original 
dynamics (DXL C ) and an easy calculation shows that the perturbed objective func¬ 
tion F r (X|,... ,Xjf) remains a strict Lyapunov function for the rate-adjustment 
dynamics above. The rest of our proof then follows essentially as that of Theorem 
3.1. ' □ 


4. Applications to Wireless Networks 

We now turn to a concrete application of the algorithmic framework presented in 
the previous sections to distributed throughput maximization in multi-user wireless 
systems. 

4.1. Problem formulation. Throughout this section, we will focus on mobile sys¬ 
tems where a set K. = {1,..., K} of different transmitters (or users ) communicate 
simultaneously with a single receiver (for instance, a base station or a wireless ter¬ 
minal). Following recent developments in wireless communication technology [1-3], 
we will further assume that each user k G 1C is using Mk antennas for transmission 
(multiplexing) while the receiver is using N antennas for signal reception and de¬ 
coding. More precisely, what this means is that the aggregate signal reaching the 
receiver can be described by the standard channel model 

K 

y = HfcXfc + z (4.1) 

k =1 

y € C N is the aggregate signal reaching the receiver (the channel’s output). 
Xfc £ C Mk is the transmitted signal (input) of the fc-th transmitter (the 
channel’s input). 

Hfc £ C NxMt denotes the transfer matrix between the A;-th transmitter and 
the receiver, representing how the transmit signal is affected by the wireless 
medium. To account for channel fading, we will assume in what follows that 
the users’ channel matrices evolve over time following a bounded stationary 
process [15] and we will denote expectations over this distribution by E[-]. 6 
z £ C N is the noise in the channel (including thermal, atmospheric and 
other peripheral interference effects). Following standard information-the¬ 
oretic caveats, we will further assume that z can be modeled as a circularly 
symmetric, zero-mean Gaussian vector with unit covariance [11, 40, 44]. 


6 As a special case, in the static channel regime, we will assume that this process is, in fact, 
deterministic. 


where: 

(1) 

( 2 ) 

( 3 ) 


( 4 ) 
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This multiple-input and multiple-output (MIMO) multiple access channel (MAC) 
model has attracted considerable interest in the literature [5, 6, 11, 28, 33, 36, 40, 44] 
and it is well known that the users’ maximum transmission rate is achieved using 
random Gaussian codes for signal encoding.' Specifically, let 

Qfc = E cb [x fc xj’] (4.2) 


denote the covariance matrix of the transmitters’ input signal distribution, with 
the expectation E c b being taken over the user’s input codebooks. 8 Then, assuming 
perfect channel state information (CSI) at the receiver, the maximum achievable 
information transmission rate of user k will be given by the familiar expression 
[15, 40]: 


Rk( Q)=E logdet (i + J2 e HfQ^Hf) - logdet W_ fc (Q_ fe ) , (4.3) 


where Q = (Qi,...,Qa) denotes the users’ covariance profile, Q_fc is the corre¬ 
sponding profile for all users except k, the expectation E]-] is taken over the users’ 
channel law, and 

W_ fe (Q_ fe ) = I + (4.4) 

denotes the multi-user interference-plus-noise (MUI) of user k.' ] 

In this context, the users’ objective is to select input signal covariance matrices 
Qfc so as to maximize their individual information transmission rate Rk{ Q) subject 
to the constraints 

tr(Qfc) = -Pfc, (4.5) 

where Pfc is the transmit power of user k [40, 44]. More formally, in the language 
of Section 2.2, the above boils down to the rate maximization game: 

maximize unilaterally rtfc(X) for all k £ /C, 

subject to Xfc 0, tr(Xfc) = 1, (^ ) 


where, for convenience, we have set Xfc = Qfc/Pfc and the users’ utility function Uk 
is simply defined as: 

«fc(X 1} ..., X K ) = Pfc(PiX 1; ..., P K X K ). (4.6) 

Clearly, in the presence of fading, the users’ objectives are stochastic in nature 
because of the expectation over H in (4.3); otherwise, in the case of static channels, 
this expectation is trivial, so (RM) is deterministic. As a result, the game-theoretic 
problem (RM) can be seen as a special case of the multi-agent stochastic problem 
(SSP). Indeed, as was shown in [6], the users’ reward functions Uk satisfy the 
potential property [23]: 

u fe (X fc ; X_fc) - U fc(X',; X_fc) = - [F(X fe ; X_ fc ) - F(X',; X_ fc )] (4.7) 


^Depending on the structure of the channel matrices H&, the channel model (4.1) actually 
applies to several telecommunications systems, ranging from digital subscriber line (DSL) uplink 
networks with Toeplitz circulant Hfc, to code division multiple access (CDMA) radio networks 
[37]. For concreteness, we will stick here with the interpretation of the signal model (4.1) as an 
ad hoc multi-user MIMO multiple access channel with H& representing the channel of each link. 

importantly, the expectation E c b[*] is not related to the expectation E[*] taken over the dis¬ 
tribution of the users’ channels. 

^From an information-theoretic perspective, we are also assuming single user decoding (SUD) 
and perfect CSI at the receiver; for a more detailed account, see e.g. [33—35, 40, 44]. 
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where the game’s potential function F is defined as: 


F(X) = - E 


logdet (l + £*^H fc X fc HjQ 


(4.8) 


and the problem’s feasible region is x — Tlfc x k with X k = {X fc <E 7 iu k : X k !>= 
0 and tr(Xfc) = 1}. 1U 

Since the function M ha logdet (I + M) is concave in M over the entire cone 
of positive-semidefinite matrices [9], it follows that F is itself convex over AT. 11 
Accordingly, the rate maximization game (RM) falls squarely in the framework of 
Section 2.2: in realistic network situations, the distribution of the users’ channel 
matrices is not known to the users, so the rate functions R k (or the game’s potential 
F) cannot be calculated a priori. As a result, to reach a Nash equilibrium of (RM), 
the system’s users cannot rely on gradient observations of R k (or F), but only on 
stochastic (and possibly imperfect and/or delayed) information on the quantities 
inside the expectation of (4.3) and (4.8), themselves obtained through an interplay 
between the transmitters and the receiver. 

The framework described above naturally calls for a distributed solution method, 
so the algorithmic material of Section 3 seems particularly well-suited to the occa¬ 
sion. In the rest of this section, we will describe the specifics of this application. 


4.2. Algorithmic implementation — synchronous updates. The first step re¬ 
quired to apply the algorithmic tools of Section 3 is to calculate the stochastic 
gradient of the users’ rate functions R k . To that end, let 

r fc (X) = logdet(H-^ / P / H / X / Hj)-logdet(I + ^ P^H/X/Hj), (4.9) 

so itfc(X) = E[rfc(X)]. Then, some matrix calculus readily yields 

V Xfc r*(X) = W -1 (X; H)h£, (4.10) 

where, in a slight abuse of notation, 

W(X;H)=I + X) / P / H<X / Hj (4.11) 

denotes the aggregate signal covariance matrix at the receiver. Thus, if H(n) de¬ 
notes the realization of the users’ channel matrices at each update period n = 
1 , 2 ,..., and X(n) is their corresponding transmit profile, we will assume that 
a) Hfc(n) is measured at each transmitter k G 1C; and b ) W(X(n); H(n)) is measured 
at the receiver and is then broadcast to the transmitters. Under these assumptions, 
each transmitter k G K, can recreate their individual (stochastic) gradient matrices 
at period n as: 

V*(n) = H t fe (n)W~ 1 (X(n);H(n))H t fe (n), (4.12) 

and, by construction, we will have: 

E [Vfc(n)] = V Xfc( „) P fc (X(n)), for all k = l,...,K. (4.13) 


^From an information-theoretic perspective, F simply represents (minus) the users’ sum rate 
under a centralized successive interference cancellation (SIC) decoding scheme [44], As a result, 
the above rate maximization problem can be seen both as a game (under single user decoding) or 
as a distributed, multi-agent optimization problem (under more sophisticated SIC schemes). For 
a more detailed discussion, see [6, 12, 18, 19, 21] and references therein. 

■^In fact, if the law of the users’ channel matrices does not contain any atoms, F is actually 
strictly convex. 
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With this in mind, Algorithm 1 provides the following rate maximization algo¬ 
rithm with synchronous updates (SU): 


Algorithm 3 MIMO rate maximization with synchronous updates (SU) 

Parameters: discount parameter r > 0; decreasing step-size sequence j n . 

n 4— 1; 

foreach transmitter k £ 1C do 

initialize Hermitian score matrix Y k £ TLm k \ 

Repeat 

n •<— n + 1; 

Receiver measures and broadcasts P = W _1 ; 
foreach transmitter k £ K, do 
Measure channel matrix HU; 

Update score matrix Y k £- Y k + 7 „ ^H fc W~ 1 H|, - rYtj; 

exp(Y fe ) 

Update covariance matrix X& i - - - _ . 

tr[exp(Y fc )J 

until termination criterion is reached. 


From the point of view of distributed implementation, Algorithm 3 has the fol¬ 
lowing desirable properties: 

(PI) It is distributed : users only update their individual variables using the same 
information as in distributed water-filling (namely the broadcast of W _1 ) 
[33-35, 44], 

(P2) It is stateless : users do not need to know the state of the system (or the 
existence of other users). 

(P3) It is reinforcing: users tend to increase their individual transmission rates u k . 
(P4) It is stable: the matrix exponentials can be calculated in a numerically stable 
and efficient manner [22]. 

Furthermore, since the users’ channels are bounded by necessity, Theorem 3.1 read¬ 
ily yields: 

Corollary 4.1. Assume that Algorithm 3 is run with a step-size sequence such 
that 7 1 < En=l = oo. Then, the algorithm’s iterates converge (a.s.) 

within e(r) of a Nash equilibrium of the rate maximization game (RM) and the 
approximation error e(r) vanishes as r —» 0 + . 

4.3. Asynchronous implementation. Let us now consider a more realistic wire¬ 
less environement where the transmitters do not share a common update clock - 
so synchronous decisions are not possible. In this context, the synchronous update 
structure of Algorithm 3 is no longer appropriate, so we will employ Algorithm 2 
(which is fully decentralized) instead. 

To that end, assume that each transmitter is equipped with an individual timer 
Tfc whose ticks indicate the update events of user k. More precisely, we assume here 
that r k : N —> R + is an increasing (and possibly random) sequence such that T k {n) 
marks the instance at which the £:-th user updates his covariance matrix X;,. for the 
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n-th time - so X;,. does not change between Tf~{n ) and Tk(n+1). Similarly, we assume 
that the receiver is equipped with a timer t 0 (n) that triggers the measurements of 

W (t) = W(X(t); H(t)) = I + E, fyH £ (t)X e (t)Hj(t). (4.14) 

Thus, at every tick of t*,, user k measures and updates X*, while, at every tick 
of To, the receiver measures and broadcasts W. This asynchronous operating mode 
fits naturally within the framework of Algorithm 2, leading in turn to the following 
implementation of Algorithm 3 with asynchronous updates (AU): 


Algorithm 4 MIMO rate maximization with asynchronous updates (AU) 


Parameters: discount parameter r > 0; 

n 4— 1 ; 

Initialize Hermitian score matrix Y. 

Repeat 

UpdateEvent occurs at time r(n); 

n <— n + 1; 

Measure channel matrix H; 

Recall latest broadcast of W; 

Update score matrix Y <— Y+ £ (HW^H^rY); 


exp(Y) 

Update covariance matrix X i - - --——; 

tr[exp(Y)] 

until termination criterion is reached. 


Algorithm 4 is run independently by each transmitter - though, of course, if all 
transmitters share a common timer, Algorithm 4 reduces to the synchronous context 
of Algorithm 3. Moreover, provided that all individual timers Tk have positive finite 
rate (i.e. lim Tfc(n)/n exists and is finite), it is easy to see that the update sequence 
generated by Algorithm 4 satisfies the assumptions of Theorem 3.8. Indeed, the set¬ 
valued process K. n used in (3.18) to indicate the set of transmitters updating their 
covariance matrices at the n-th overall update event may be obtained from the users’ 
individual timers t* as follows: First, let KL{t) = {k £ K, : Tk(n) = t for some n £ N} 
denote the set of players updating at time t and let n(t) = card{s < t : /C(s) ^ 0} 
be the total number of update epochs up to time t. Then, K. n = /C(inf{i : n(t) > n}) 
and ?7fc(n) = ^ £r), so the limit lim Tk{n)/n exists and is finite if and 

only if the limit lim rik(n)/n exists and is positive. With all this in mind, we readily 
obtain: 

Corollary 4.2. The iterates of Algorithm 4 converge ( a.s .) within e{t) of a Nash 
equilibrium of the rate maximization game (RM) and the approximation error e(t) 
vanishes as r —> 0 + . 

4.4. Learning with imperfect information. As a final application of the algo¬ 
rithmic framework of Section 3 to the problem at hand, we turn to the case where 
the users’ channel matrices EU are static but channel and interference measure¬ 
ments are subject to observation noise and measurement errors. In this case, the 
rate maximization game (RM) becomes deterministic but the system is still subject 
to stochasticity originating from noise and uncertainty in the users’ measurements. 
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In the perfect information case, the (deterministic) semidefinite problem (RM) 
may be solved by water-filling techniques [11], properly adapted to multi-user en¬ 
vironments [32, 33, 44]. Such methods can be either iterative (with users updating 
their covariance matrices one after the other, in a round-robin fashion) [44] or simul¬ 
taneous (with users updating all at once) [32]. The benefit of the former (iterative) 
scheme is that its convergence is guaranteed [44]; however, the algorithm’s conver¬ 
gence rate is inversely proportional to the number of users in the system (making 
such methods unsuitable for large networks). On the other hand, simultaneous 
water-filling methods are faster [32], but their convergence is conditional on certain 
“mild intereference” conditions which fail to hold even in very simple 2x2 systems 
[20]. Making matters worse, water-filling methods rely on perfect channel state 
information at the transmitter (CSIT) and perfect measurements of the output sig¬ 
nal covariance matrix W at the receiver; when it is impossible (or impractical) to 
obtain such noiseless measurements, it is not known whether water-filling methods 
converge. 

In light of the above, our goal here will be to provide a viable alternative to water¬ 
filling based on (DXL). To that end, we will focus on two sources of measurement 
noise: 

(1) The (static) transfer matrices HR can only be measured at the transmitter 
up to some random observational error. 

(2) The receiver can only estimate the covariance W of the aggregate received 
signal y via random sampling (assumed to occur between the updates of 
the transmitters). 

Even though these two randomness sources are independent of one another, the 
gradient matrices VR = HfcW -1 !!], of (4.12) depend nonlinearly on HR and W, so 
care must be taken to construct an unbiased estimator of \R from noisy estimates 
of HR and W. 

We first consider the random perturbations induced on the estimation of W _1 
by signal sampling at the receiver end. On that account, recall that W is simply 
the covariance matrix of the aggregate received signal y £ C N : 

E[yyt] = E [zz+] + ^ H fc E [x fc x£] Hj. = I + ^ U k Q k U{ = W. (4.15) 

As a result, an unbiased estimate for the covariance W of y may be obtained from a 
systematically unbiased sample yi,..., ys of y by means of the classical estimator 

w = 5- 1 Eliy,y|- 12 

On the other hand, given that W” 1 is a biased estimator of W _1 (and hence 
introduces a systematic error to the measurement process) [4], we cannot use this 
classical covariance estimate for the received signal precision (inverse covariance) 
matrix W _1 . Instead, following [4], an unbiased estimate of the precision matrix 
P = W^ 1 of y is given by the corrected expression: 

P = S-^ W- 1 , (4.16) 


12 Since E c u[y] = 0, we do not need to include an .S '/(S — 1) bias correction factor in the 
estimate of W. Also, in a slight abuse of notation, the measurement expectations here are taken 
with respect to the law of x, y, and z. 
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where W = <S'~ 1 ]Cf=i ys>4 as before. Thus, to obtain W _1 , the receiver only 
needs to take S > N + 1 independent measurements of y and then broadcast the 
unbiased estimate P of W” 1 to the network’s users. 

Similarly, in the absence of perfect channel state information at the transmitter, 
the users must obtain an unbiased estimate of the unilateral gradient matrices V*, = 
Hj,W _1 Hfe from the broadcasted value of W and imperfect measurements of their 
channel matrices H^. However, an added complication here is that the estimated 
matrix must be itself Hermitian - otherwise, need not be positive-definite 
and the DXL scheme may fail to be well-posed. To accommodate this requirement, 
if each transmitter takes S > 1 independent measurements Hfc.i,..., H^.S of their 
individual channel matrix H^, such an estimator is given by the expression: 




(4.17) 


where P is the broadcast estimate (4.16) of W 4 . Indeed, given that the sample 
measurements are assumed stochastically independent, we will have: 

= g ( g 1 _ !, X,*,, E I*lj E EP] EIHl.J = Htw-'H*. (4.18) 

where we have used the independence of the samples to decorrelate the expectations 
in the second equality, and we relied on the unbiasedness of P and H*, for the last 
one. Thus, with E[Vj.] = V*., our construction of an unbiased estimator for is 
complete. 

From an implementation viewpoint, the above leads to the following distributed 
operation protocol. First, with notation as in the previous section, let tq denote 
the receiver’s measurement timer (so tq(ji) is the n-th instance in time at which the 
receiver measures and broadcast W~ 4 ). Then, at each tick of r 0 , the receiver takes 
a sample of the received signal y of size S > M + 1 and computes the estimate 
P(ro(n)) as above. 13 Likewise, if is the update timer of user k (so Tfc(n) is the 
n-th update time for user k), each transmitter k £ K, is assumed to measure his 
individual channel matrix and calculate his gradient estimate V*, using the recipe 
(4.17) with the latest broadcasted value of P. 14 Theorems 3.1 and 3.8 then yield: 


Corollary 4.3. With notation as before, the iterates of Algorithm f with imperfect 
feedback converge (a.s.) within e(t) of a Nash equilibrium of the rate maximization 
game (RM) with static channels; moreover, the approximation error e(r) vanishes 
as t —} 0 4 ". 


4.5. Numerical results. To assess the performance of (DXL) applied to realistic 
network conditions, we simulated in Fig. 1 a multi-user uplink MIMO system con¬ 
sisting of a wireless base receiver with 5 antennas and K = 25 transmitters, each 
with a random number mk of transmit antennas picked uniformly between 2 and 


13 We implicitly assume here that this measurement process takes a negligible amount of time. 
This assumption is justified by the fact that the characteristic time at which the receiver estimates 
W for decoding purposes is much shorter than the interval between user updates. 

14 Obviously, if to = 77 , for all k G /C, the above process boils down to the synchronous regime 
of Algorithm 3. 
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Figure 1. The effect of the discount parameter r on the end-state 
of the discounted exponential learning algorithm (Algorithm 3). 

6. 15 For the static channel case, each user’s channel matrix H*, was drawn from 
a complex Gaussian distribution at the outset of the transmission (but remained 
static once chosen), and Algorithm 3 was ran with a large constant step size for 
different values of the discount parameter r. The performance of the algorithm 
over time was then assessed by plotting the normalized efficiency ratio 

rr / \ A(n ax F n . 

eff ( n ) = p—( 4 - 19 ) 

where F n denotes the users’ sum rate at the ri-tli iteration of the algorithm, and 
F max (resp. F m i n ) is the maximum (resp. minimum) value of F over the system’s 
set X of feasible covariance matrices. Thus, by definition, an efficiency measure of 
1 corresponds to a Nash equilibrium of the rate maximization game (RM) while an 
efficiency ratio of 0 means that the system is very far from equilibrium. * 1 * *’ In tune 
with Theorem 3.1, Fig. 1 reveals that Algorithm 3 converges within a few iterations 
(effectively, within a single iteration for low r), but the end value of the users’ sum 
rate deteriorates for higher values of the discount parameter r. 

In Fig. 2, we fix the algorithm’s discount parameter to a low level (r = 10~ 3 ) 
that ensures effective convergence to Nash equilibrium, and we investigate the al¬ 
gorithm’s convergence speed as a function of the number of transmitters, using 
existing water-filling methods as a benchmark. Specifically, in Fig. 2(a), we ran Al¬ 
gorithm 3 for a multi-user uplink MIMO system with K = 10, 25, 50 and 100 users 
using a large, constant step size; as a result of this parameter tuning, Algorithm 3 
effectively attains the system’s sum capacity within one or two iterations, even for 
large numbers of users. Importantly, as can be seen in Fig. 2(b), this represents a 
marked improvement over water-filling methods, even in moderately-sized systems 

1 J For simplicity, throughout our numerical simulations, we focused on the synchronous updates 
case (Algorithm 3). 

1 *’The reason for using this efficiency measure instead of the user’s sum rate F directly, was 

to eliminate any scaling artifacts arising e.g. from F taking values in a very narrow band close to 

its maximum value. 
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System efficiency over time (static channels) 



(a) Convergence speed of Algorithm 3 for differ¬ 
ent numbers of users. 


System efficiency over time (static channels) 



5 10 15 20 25 30 35 

Iterations 


(b) Discounted exponential learning vs. water¬ 
filling for K = 25 users. 


Figure 2. The convergence speed of discounted exponential learn¬ 
ing with synchronous updatess as a function of the number of users. 


with K = 25 users: on the one hand, iterative water-filling (IWF) [44] is signifi¬ 
cantly slower than SU (it requires O(K) iterations to achieve the same performance 
level as the first iteration of Algorithm 3), whereas simultaneous water-filling (SWF) 
[31] fails to converge altogether. 

The robustness of discounted exponential learning is investigated further in Fig. 3 
where we simulate an uplink MIMO system consisting of I\ = 25 transmitters with 
imperfect CSI and noisy measurements at the receiver. For simplicity, we modeled 
these errors as additive i.i.d. zero-mean Gaussian perturbations to the matrices 
Vfc = HfcW _1 Hj. that are used in the update step of SU, and the strength of these 
perturbations was controlled by the ratio of the errors’ standard deviation to the 
matrix norm of (so a relative error level of 77 = 100 % means that the measure¬ 
ment error has the same magnitude as the measured variable). We then plotted 
the efficiency ratio achieved by Algorithm 3 over time for average error levels of 
77 = 15% and 77 = 100%; for benchmarking purposes, we then also ran the iterative 
and simultaneous water-filling algorithms with the same relative error levels (and 
noise realizations). As can be seen in Fig. 3, the performance of water-filling meth¬ 
ods remains acceptable at low error levels (attaining 90-95% of the system’s sum 
capacity), but when the measurement noise gets higher, water-filling offers no per¬ 
ceptible advantage over the users’ initial choice of covariance matrices. By contrast, 
discounted exponential learning retains its convergence properties even for relative 
error levels as high as 100 % - though, of course, the algorithm’s convergence speed 
is negatively impacted. 

Finally, to account for changing channel conditions, we also plotted the perfor¬ 
mance of Algorithm 3 for non-static channels following the well-known Jakes model 
of Rayleigh fading [10]. More precisely, in Fig. 4, we consider a MIMO uplink sys¬ 
tem with 3 receive antennas and K = 10 users with 2 antennas each, transmitting 
at a frequency of f = 2 GHz and with average pedestrian velocities of v = 5 km/h 
(corresponding to a channel coherence time of 108 ms). We then ran Algorithm 3 
with an update period of 6 = 3 ms, and we plotted the achieved sum rate F(t) at 
time t versus the maximum attainable sum rate -F max (f) given the channel matri¬ 
ces Hfc(t) at time t (and versus the “uniform” sum rate that users could achieve 
by spreading their power uniformly over their antennas). As a result of its high 
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System efficiency over time (with measurement errors) System efficiency over time (with measurement errors) 




(a) Learning with a relative error level of 15%. (b) Learning with a relative error level of 100%. 

Figure 3. The robustness of entropy-driven learning in the pres¬ 
ence of measurement errors: in contrast to water-filling methods, 
the entropy-driven learning attains the channel’s sum capacity, 
even in the presence of very high measurement errors. 



Figure 4. The performance of entropy-driven learning under 
changing channel conditions (following the Jakes model for 
Rayleigh fading with parameters indicated in the figure caption). 


convergence speed, Algorithm 3 tracks the system’s sum capacity remarkably well, 
despite the changing channel conditions. Moreover, the sum rate difference between 
the learned transmit covariance profile and the uniform one shows that this tracking 
is not an artifact of the system’s sum capacity always being within a narrow band 
of its (evolving) maximum, but a real consequence of learning. 

5. Conclusions 

In this paper, we introduced a class of distributed algorithms based on a regular¬ 
ized variant of matrix exponential learning for stochastic semidefinite programming 
with applications to robust spectrum management in multi-user MIMO systems. 
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This adjustment of classical exponential learning generates a discrete-time algo¬ 
rithm which tracks the continuous-time dynamics of adjusted exponential learning 
and converges arbitrarily close to the system’s optimum configuration. Thanks to 
this adjustment term, the algorithm remains robust in the presence of stochastic 
perturbations: it converges even when the agents only have imperfect (or delayed) 
information at their disposal, or even if they update in a fully asynchronous manner 
and independently of one another. 

The optimization method of adjusted exponential learning method actually ap¬ 
plies to a wide range of semidefinite problems; we focused here on the MIMO MAC 
where our approach dominates classical water filling techniquesboth in terms of 
speed of convergence and robustness to random perturbations. 

In the case of multi-user MIMO systems, it out-performs traditional water-filling 
methods, both in terms of robustness to imperfect signal measurements and speed 
of convergence: in practice, the algorithme converges within a few iterations, even 
for large numbers of antennas. 
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